OPEN 3 ACCESS Freely available online 



•0-PLOS I o-^E 



Buckley-James Estimator of AFT Models with Auxiliary ^ 
Covariates 



CrossMark 



Kevin Granville\ Zhaozhi Fan^* 



1 Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada, 2 Department of Mathematics and Statistics, Memorial University, St. 
John's, Newfoundland, Canada 



Abstract 

In this paper we study the Buckley-James estimator of accelerated failure time models with auxiliary covariates. Instead of 
postulating distributional assumptions on the auxiliary covariates, we use a local polynomial approximation method to 
accommodate them into the Buckley-James estimating equations. The regression parameters are obtained iteratively by 
minimizing a consecutive distance of the estimates. Asymptotic properties of the proposed estimator are investigated. 
Simulation studies show that the efficiency gain of using auxiliary information is remarkable when compared to just using 
the validation sample. The method is applied to the PBC data from the IVlayo Clinic trial in primary biliary cirrhosis as an 
illustration. 
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Introduction 

It is not uncommon to have one or more missing or 
mismeasured covariates in large coliort epidemiological studies. 
There are always cases in medical studies, where it is difficult to 
obtain an accurate measurement for all patients due to a 
procedure being too expensive or invasive. Alternatively, some 
auxiliary measurements which are less precise, but highly related 
to the target procedure, can be easily collected. In some situations, 
all of the measurements are error prone, while in other cases, a 
validation subsample, where the measurements are all accurately 
taken, is made available. 

The former is a pure measurement error problem. The purpose 
of this paper is to investigate the inference of the latter cases in a 
failure time setting. In some cases, the validation sample could be 
large enough on its own, so one could choose to ignore all data 
from subjects that have missing or mismeasured values for any of 
the covariates, with just a minor efficiency loss. However, if the 
validation sample is relatively small, utilizing the auxiliary 
information wiU lead to remarkable efficiency gain, as our 
simulation results wiU show. 

The literature on statistical inference of missing or mismeasured 
data of failure time regression models is abundant. Ignoring the 
measurement errors in modeling could lead to severe estimation 
bias, depending on the magnitude of the measurement error, 
hence invalidate the whole inference procedure ([1] Prentice, 
1982). See also [2] (Rubin, 1976), [3] (FuUer, 1987), [4] (Carrol et 
al., 1995), and [5] (Wang et al., 1998) among others. The negative 
influence of mismeasured or missing covariates is largely 
understood for the Cox proportional hazards model. But the 



same cannot be said of accelerated failure time models. Details 
about the Cox model can be seen in the work of [6] (Cox, 1972), 
[7] (Cox and Oakes, 1984), [8] (Kalbfleisch and Prentice, 2002), 
[9] (Hu et al., 1998), and [10] (Hu and Lin, 2002), and the 
references therein. See [11] (Zhou and Pepe, 1995), [12] (Zhou 
and Wang, 2000), [13] (Liu, Zhou and Cai, 2009), [14] (Fan and 
Wang, 2009), [15] (Liu, Wu and Zhou, 2010) censored survival 
models with auxiliary covariates. 

However, due to the direct physical interpretation of the AFT 
models, and the fact that AFT models are robust to model 
misspecification in the sense that ignoring a covariate wiU not lead 
to much bias in estimating the remaining regression coefficients, 
see [7] (Cox and Oakes, 1984), the biasing effect of covariate 
measurement error on AFT models deserves further investigation. 
A recent work on the subject of measurement error in AFT models 
was done by [16] (He et al, 2007), using a simulation and 
extrapolation approach. [17] (Yu and Nan 2010) studied the 
regression calibration approach within the semiparametric frame- 
work, assuming a known parametric relationship between the 
accurately measured covariates and their auxiliaries, up to a few 
unknown nuisance parameters. [18] (Granville and Fan, 2012) 
studied the parametric AFT models with auxiliary covariates 
based on maximum likelihood method. 

In this paper, we study the Buckley-James estimator [19] 
(Buckley and James, 1979) of AFT models with auxiliary 
covariates. The Buckley-James estimator was shown by [20] (Jin 
et al., 2006) to be consistent and asymptotically normal when using 
a consistent estimator as the initial value, due to its asymptotic 
linearity. Some other insights about the consistency and asymp- 
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totic theory of this estimator has been investigated by [21] (James 
and Smith, 1984) and [22] (Lai and Ying,1991), among others. We 
propose a local polynomial approximation method to handle the 

missing or mismeasured covariates, through the estimation of the 
conditional expectation of the unobservable estimating functions. 
This approach makes neither distributional assumptions about the 
model error term g,-, beyond it having mean zero and a finite 
variance, nor parametric assumptions on the relationship between 
the correctly measured covariates and their auxiliary variables. 
The proposed approach will be introduced through a kernel 
smoothing method, a special case of the local polynomial 
approximation, see [14] (Fan and Wang, 2009), mainly due to 
the ease of presentation. See [23] (Nadaraya, 1964), [24] (Watson, 
1964), and [25] (Wand and Jones, 1995) for details of kernel 
smoothing. Intensive simulation studies were conducted to 
investigate the small sample performance of our proposed method. 
The results show a remarkable efficiency gain over the method 
which ignores the auxiliary information. 

The remainder of this paper is organiz(xl as follows. In the 
second Section, we introduce Buckley-James estimator for the 
accelerated failure time model and present our estimation method. 
Then we investigate the asymptotic properties of our proposed 
estimator. The Section thereafter contains the results and 
discussion of our numerical studies, including simulations and 
the PBC data illustration. In the last Section, we put forth some 
concluding remarks. The proofs for Theorems were deferred to 
the appendix. 

Inference Methods of Accelerated Failure Time 
Model 

Let Ti and C,, i=\, . . . ,n be the failure and censoring times for 
the «th subject in a large study cohort. Due to the censoring, we 
observe 5*,= min(T',-,C,) as well as a failure indicator 
di = I{Ti<Ci). Let {Xi,Zi} denote the covariate vector where 
Xi is the component which is only observed in the validation set 
and Z,- is the component that is available for the full study cohort. 
Let Wi be the auxiliary covariate to Xi. Hence the data consists of 
the validation sample {Si, 5,-, Z,-, Xj, W,}, and the nonvalidation 
sample {S,-, 5,-, Z,, Wi}. In this paper we assume that Xi is scalar, 
mainly because of the simphcity of the presentation, and Z,- may 
be either a scalar or a vector. In practice, Xi could also be closely 
correlated with Z/. A special case is the classical measurement 
error model Wi = Xi + Ui, where C/, is the error encountered when 
measuring Xi. It is assumed that the {7,'s are independent and 
identically distributed random normal variables, Ui~ N{0,a^). Of 
the n observations, the validation sample contains ny observations, 
and the non-validation sample contains ny = n — ny obser\'ations. 

The accelerated failure time model based solely on the 
validation sample, can be expressed as 



Yi=\og{Ti) = PyXi + PiZi + ei, 



(1) 



where P' = (Pi,P2) is a vector of unknown regression coefficients 
and the 6i's are independent and identically distributed with an 
unspecified distribution F which has mean zero and finite 
variance. Equation (1) assumes automatically that Wi provides 
no additional information about the failure time, given {Xj, Zi}. 

Without making any assumption to the distribution of 6,-, the 
Buckley-James least squares procedure (Buckley and James, 1979) 
estimates the regression parameters through the minimization of 



J2{Yi-PiX.-l}^Zif. 
i=l 

The least squares estimates of and jij, are such that 

n 

Y,MYi-PiXi-p^Zi) = 0, (2) 



and 



Y,Zi{Yi-piXi-PiZi) = 0. 

!=1 



(3) 



In order to deal with censoring, let 
F; = Yidi + E[Yi\ Yi > \og(Ci)](l-5i). Then 

E[y,*]=ftjr,+;8^z„ so EE«^iJr,(r;-;6iJsr;-)Siz,)]=o, and 
B[j:UZi(Y;-piXi-i}2Zi)]=o. 

The estimators bi and b2 ot P then satisfy 



and 



J2^iiV-bi^i-b'2Zi) = 0, 



Y,ZiiY*-biXi-b'2Zi) = 0. 

!=1 



(4) 



(5) 



However, the distribution of 6,- is unknown. The distribution of 
Yi, and consequentiy, E[ F;| J; > log (C,)] are both unknown. The 
censored observations are hence replaced by 



Yi(b) = biXi + b'2Zi + 



l-Ftieiib)) 

where ei{b) = Si — b\Xi — b'2Zi, i=\,. . .,n are the residuals, and 



i:«(,)<e \n — l+ V J 



is the Kaplan-Meier Product Limit estimator of the distribution 
function of the residuals, F. Fi,(s) is a discrete function which will 
not tend to 1 as £ increases if the largest residual is censored. 

Therefore, following the convention of Buckley and James, the 
largest residual is redefined as uncensored for all calculations, if 
necessary. 

Let?; ib)= Yidi+Yiib)(\ -Si). The estimator for )S' = (;8i,j82) is 
the solution of the following equation. 



^(jr,-,zo'(Jr,-,z;) 



J2(x.,Zi)V{li) 



= Y(i5). (6) 



It should be noted that y{fl) depends on Xi, which is available 
only for the validation sample. For the non-validation sample we 
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can substitute the estimates of their conditional expectations given 
the auxiliary and other available covariates. The local polynomial 
approximation approach can be applied for this purpose, see [14] 
(Fan and Wang, 2009). For the simplicity of the presentation, we 
use the kernel smoothing method to estimate the conditional 
expectation of the unobserved covariates given the auxiliary 
information. 

Note that this simplification does not necessarily lead to 
efTicicncy loss. Since the direct estimation of the conditional 
expectation of the estimating function depends also on the 
Kaplan-Meier estimation of the survival function of the regression 
residuals, it could also introduce additional instability into the 
inference, as compared with imputing the estimated conditional 
expectation of the mismeasured covariate. Our simulation also 
revealed this observation (results not included). 

The conditional expectation of the mis-measured covariate, 
denoted by Xi, can be estimated as 

Ejerkh{Ti-rj) 

where Ti = (Wi,Zi)', ki,{-) = (hi ■ ■ ■ hj)-'^k{-/hi,- ■ ■ ,-/hd) is the 
kernel function and h = {h\, - ■ ■ ,hd)^ is the chosen vector of 
bandwidth. Using these X,'s in place of the «p missing X/'s, we 
may solve (6) for [i using a numerical method, like Broyden's 
method. Note that, Broyden's method requires two initial values, 
while the method of [21] (James and Smith, 1984) only requires a 
single initial value. In this function, y„{P(k)) is the value of (6) ^vhen 
calculated using p^^^y A very natural selection of the initial value of 
P is the least squares regression estimator calculated from the 
validation sample. 

The standard deviation of these estimators is estimated using 
bootstrapping. For R replicates, a simple random sample with 
replacement of the full sample size is taken from the obser\'ed data 
and the above estimation method for fi is repeated on each 
replicate. A sample standard deviation is then calculated to 
estimate the true standard deviation of the P estimators. 

Remark 1 In order to retain the same quality of information 
among the replicated estimations, an alternative method of 
resampling was allempled la keep the proportion of censored 
observations constant in each replication. We defined 
5* = YTi= 1 ^' ^"faZ number of observations with 

uncensored failure times. For R replicates, a simple random sample 
with replacement of size d* was taken from these uncensored 
observations, and a simple random sample with replacement of size 
n — S* was taken from the remaining censored observations. 
However this alternative method was found to underestimate the 
true standard deviations and resulted in coverage probabilities that 
were lower than the nominal level. The reason of this outcome is 
mostly due to the fact thai ihe independence of the censoring 
mechanism was broken by the sampling method. 

Defining a Solution 

In order to solve the estimating equations for the regression 
parameters, we use the iterative scheme of Pi^k+i) — ! n^fi{k))- 
However, as noted by [19] (Buckley and James, 1979) and [21] 
(James and Smith, 1984), these iterations need not converge. The 
Y(j6) function is discontinuous and piecewise linear in P so an exact 
solution may not exist. When this is the case, the iterations can 
oscillate between two values of /?. We define a possible alternate 
solution which is closest to satisfying P = y(P), or y(P) — P = 0. If 
j8(^) is oscillating between two points due to the lack of an exact 



solution, we define the alternate solution as Pg^^ that minimizes the 
modulus of this difference, 

min|/?(t+i)-/J(i)|. (8) 



When the iterations do not converge, a cut-oflF point has to be 
determined to stop the iterations. It is advised to select a number of 
iterations that is slightly greater than the amount required for the 
convergence when a solution typically does exist. For most of our 
simulations, we have set this point at A: = 11 . In many cases, our 
simulations converge in three or four steps. So at fc = 5 or A: = 6, 
P(k) breaks the loop when checked against P(k-\) for convergence, 
implying the solution being reached at iteration k—l. If the 
iteration does not converge, the first ten values are checked and 
whichever value, after, say 5 steps of iteration, satisfies equation (8) 
is selected as a solution. 

When dealing with real data, it is advised to choose an 
arbitrarily large number for the cut-ofiF point to find the best 
possible solution. 

Asymptotics 

In this section, we investigate the asymptotic properties of our 
proposed estimator. For that sake, we rewrite the estimating 
function and the Kaplan-Meier estimator of the residual survival 
function in the counting process frame work. Define a function 
U(b,P) by 

U(b,P)=Y,[iXi,Z[)-{X,Z,)\ 

i= 1 

{Y;{b) -Y;{b) - [{Xi,z[) - {x,z?,]p}> 

where i -^i /«, '^=Y^i=\^iln and 

y,*(^)= X]i'=i ^;'(^)/''- The estimating equation (6) can be 
rewritten as 

m,P)=m)=^- 

The Buckley-James estimator solves the above equation. 

When some of the covariates are accurately recorded only for 
the validation sample, but with relevant auxiliary information 
available for the whole study cohort, the estimating functions 
involved those mis-measured covariates belonging to the non- 
validation sample. We propose to estimate those terms by using 
the local polynomial approximation approach. Let n denote the 
size of whole study cohort, for ;'=!,...,« be the validation 
indicator. Define 

f^ = {XuZ'>|^^ + {%,Zl^{\-^^. 

The derived estimating equation is then 

f^(/^)= E[r/-r] \Y*{^)-Ym- [f,-r] 
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Our proposed estimator of the regression parameter, accom- 
modating the auxihary information, P solves this derived 
estimating equation. 

For a vector a, define 0®" = !, a®'=a, a®^=ad and 
||a|| = sup,- Let Yi{t) = I{ei(P)>t), for i=l,...,n and 

Y(t)=EU YM- i^t =(Xi,z'i)'ni+i^;,z'd'(\-nil where 

X*=E{Xi\Wi,Z'i) for ieV. 

Let = E"=i Wf^ 

%)(0 = «-'E"=i i-XOrf and 



s(k)it)= lim V r,®*P(e,>/|ro. 

1=1 



Denote further 5^(A:)(0 = «"' E"=i i^iWf^f 

5(^)(0 = «-' E"=l y,(Or;®* and 

4,(0= Jim r;®'^pfe>?|r,). 

1 = 1 



Without loss of generality, let d be the dimension of F,- in the 
definition of the local polynomial approximation. Suppose further 
that a is the order of the kernel function K, i.e. 

I u''K(u)du = 0, for q=\,2,- ■ ■ | u'^K(u)du^O, 

and J K^{u)du< + CO. The bandwidth conditions are given 

below. 

[BC] As «^oo, nh^'^-'-O, rf'^^oo. 

The following assumptions, beyond the bandwidth conditions, 
are necessary for the asymptotic properties of the proposed 
method. 

CO The hazard rate function X{t) of ei(fi) is such that 
J- 00 Kt)dt< 00. 

C.l There exists a constant B>Q, such that ||Z,|| <5, <B 

and II W^- II < 5 for all /=1,...,«. 
C.2 F has a twice-continuously diflferentiable density / such that 



/•OO 

t^dF(t)< CO and 

J — 00 



if'{i)lf{i)fdF{i)«^. 



C.3 The solution to U(fi) = 0 is unique and is an interior point 

of B, where B is a compact subset of R^. 
C.4 There exists a function n(t) such that, as «->oo. 



sup 

teR 



-n(t) 



•0. 



Remark 2 T/k; assumption C.3 is proposed jusl to simplify ihe 
proof of the asymptotics of the Buckley-James estimator. This could 
be violated due to the instability of the Kaplan-Meier estimator of 
the survival function when getting into the distribution tail. When 
this violation happens, the tail modification by [22] (Lai and Ying, 
1991) should he applied and the method of [26] (Jin. el al, 2006) 
of selecting a consistent and asymptotically normal estimator as the 
initial value can be adopted. 



Theorem 1 Under conditions C.0-C.4 and the bandwidth 

conditions [BC], n^^/^U(fi) converges in distribution to a zero- 
mean normal random vector with covariance matrix 
pS(;6) + (l-p)Si(jS), where p= lim„^oo «f/«, 



i:j(n-r,(,-S|^)...„[.«,« 



P 



where Q= lim„^oo^ J2jer Qh 



Qr 



E(Tj\Yj{i)=\yy 



■S(0)(0. 



[r,-£(r,|yxo=i,K)rj^, 

and 2i is defined as 

Theorem 2 Under assumptions C.0-C.4 and the bandwidth 
conditions [BC], n^'^I^U{fi) is asymptotically linear within the 
n~^l^ neighborhood of ft with probability 1, in the sense that 

n-^l^ U{b) = n-^I^U{p)-[pA + {\-p){A' + 0] 
if\\b — P\\<rr^l^, where the matrix A is defined as 



A = 

and A* as 

A*-- 



is))ds 



dX{t), 



■'(1) 



■S*(0)(0 



(\-Ffi{s))ds 



dX{t). 



Corollary 1 Under assumptions C.0-C.4, the bandwidth 
conditions [BC] and the assumption that A is nonsingular, the 
solution P to «^'/^£/(yS) = 0 converges in probability to jS. 

Theorem 3 Under assumptions C.0-C.4 and the bandwidth 

conditions [BC], 'Jn(ji — j')) converges in distribution to a zero mean 
normal random vector with covariance matrix 

[p^ + (l-p)(^* + 0]-'[pS(^) + 
(\-p)^,{ji)]\pA + {^-p){A* + Q)]-\ 



PLOS ONE I www.plosone.org 



4 



August 2014 I Volume 9 | Issue 8 j e104817 



Buckley-James Estimator of AFT Models 



Proof of Theorem 1 

Let V be the set of the indices of the validation sample, V that of 
the non-validation sample and )j,=/(/eK), i=\,...,n be the 
validation indicator. 

idr= r,/«F,f'^= E,eF «if* = E,eF r?/«K- 

Let 



r=Ji:('?.r,+(i-'/,)f,). 

( = 1 



Define 



and 



Then 



g;(iS)-e,(;?)=(i-'/,Mr,-n). 



Let F(-) and F'^{-) be the distribution functions of e and e, A 
and A'^ be their cumulative hazard functions. Then 



Mi{t) = dil{em<t)- Iiei(fi)>u)dAiu), 



and 



M^it) = dilieiifi}<t)- {' I{eim>u)dA\u), 

J —00 

are martingales witii respect to complete (T-field generated by 
6i, I{em<t), I(em<t), X„ W,, i=\,...,n. 



Further, 



A^(0=A(f+(i-^,F(r,-r*)), 



and 



Aff(0=M,.(r+(i->7,.Mr,-r*)). 



Let F^{-) and ^^^(0 be the (nominal) Kaplan-Meier estimators 
of ^"^(0 and F^(-). The estimated Buckley-James estimating 
function can be rewritten as 



t/(j6)=£fcr,.+(i-;7,)f,-f] 



l-F^iem 



J2hr.+ii-niWi-r] 



i=l 



n«) udF(u) 



l-Fiem 

QfD udFpiu) udF(u) 



;=1 

+ ^«(i-5,)fcr,+(i-»/,)f,-f] 
1=1 



= /i 4-/2+73-1-/4. 



By the martingale representation of Kaplan-Meier estimator of 
the survival function see [25] (Fleming and Harrington, 1991), also 
see [20] (Jin, et al., 2006), the bandwidth condition [BC] and the 
continuity of F (assumption C.2), the first two terms in the above 
equation can be rewritten as 



n (-co ^1 

h+h=Y. {[»;,r,+(i-^,)r,-fj 



i:WdF(t)\ 



\+lr{t)]dMi{t)+op{n'l^), 



where 



lrit)=n'' ('''r- + (1 - '?,)?, - f )(1 - <5,)4(0, 

! = 1 



and 



^,(0 = 



S{s)ds — + 



S(s)ds 



i(t>em 
xo ■ 



The Kaplan-Meier estimators of the survival functions of e,(;6) 
and 6,0?) lead to 



l-F^iem='^-Fpieim 



and 



udFg(u)- 



udF^iu) = -(l-^.)/Ji(jr,- -x;){\-Fft{em))- 
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Hence 



74 = - £ (1 - - ,j,)fcr, + (1 - ^,)f , - f] , - r*). 



,1/2 



JeV 



and 



;=1 



This term can be further rewritten as 



+ ■ 



Qn) udF{t)\ 



jeV ■> 



E{Tj\Yj{i)=\,V)- 



{Tj-E{Tj\ Yj{t)=\,V))'fi,^^^ +o,{\) 



Note \hdXUv{^) is a sum oinv i-i-d. terms hence central limit 
theorem applies. By conditions CO through C.4 and the 
martingale central limit theorem, U^ifl) converges in distribution 

to a normal random vector. Further, by independence ofUyifi) 
and U^ifi), we have 



1-p 



JeV 



u{p) ^ N{Q,p^m + (1 - my 



We have 



„l/2 



urn- 



1 " fco 

;;T7igJ_{k'+(i-'^<)r--f] 



+ 



1 1-p 

p 



jeV 



By assumptions CO, C2 and Lenglart's inequality, we have 
1 r ^ - _ 1 

^E (r,.-f-)-(r;-r*) 

" . 77 J — GO 

j^jn udF{r)\ 
Hence the estimating function can be rewritten as 



„l/2 



^l/2X^ 



r,-r] t 



1-F{e0)) 



Proof of Theorem 2 

From the equation (U) in the proof of Theorem 3.2, and by 
Theorem 4.1 of Lai and Ying (1991), we have, for \\b-P\\<n-^l^, 



1 



UUb)-- 



1 



„l/2 - V^"^ - „l/2 + (1 - - /J) + 0(1 + V^(^ - /})) 



with probability 1. The termUyib) consists of two parts, we have 

{pA + (1 - p)Q)Mb + 0(1 + Mb - m, 
with probability 1. Hence 



(p^ + (1 - + Q))Mb -P) + o{\ + Mb- P)), 

with probability 1. 

Corollary 1 and Theorem 3 are direct conclusions of Theorems 
1 and 2. 



Results of Numerical Studies 

Simulation Studies 

In this section we examine the small sample performance of our 
proposed estimator. Let fi^ denote our proposed estimator of the 
regression coefficients. Its small sample performance is compared 

with three alternative estimators: the validation estimator [Py) 
which is based solely on the validation sample; the naive estimator 
ififl), which ignores the measurement error by assuming that the 
unobser\'ed X,'s are equal to the observed IF,'s; and the complete 

case estimator {Pcr)> when we assume that X,- are observed for the 
whole study cohort. 
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Table 4. AFT model analysis of PBC data, smoothing for \og{ast). 



Covariate 




SD 


P-Value 


Pr 


SD 


P-Value 


Intercept 


15.5304 


2.5729 


1 .5792e-09 


16.1642 


2.3047 


2.3239e-12 


log(ast) 


-0.3783 


0.1926 


4.9482e-02 


-0.3364 


0.1805 


6.231 le-02 


age 


-0.0278 


0.0058 


1.8556e-06 


-0.0249 


0.0061 


3.9895e-05 


log(albumin) 


1 .4729 


0.5551 


7.9733e-03 


1.3926 


0.5883 


1.793 le-02 


log(bili) 


-0.4800 


0.0781 


7.7648e-10 


-0.4510 


0.0781 


7.7448e-09 


edemaOS 


-0.4387 


0.2124 


3.8858e-02 


-0.3006 


0.2221 


1.7593e-01 


edemal 


-0.9190 


0.2968 


1.9610e-03 


-0.9178 


0.3063 


2.7279e-03 


log{protime) 


-2.4323 


0.8712 


5.241 5e-03 


-2.8227 


0.7813 


3.0267e-04 


doi:1 0.1 371 /journal.pone.Ol 0481 7.t004 



The data for the simulations was generated in the following 
way. The Xi and Z,- are generated from a uniform distribution, 
uniform [0,5]. For each Xj, the auxiliary covariate is 
defined as Wi = Xi + Ui, where Uj is generated from a normal 
distribution with mean zero and standard deviation (7„. The value 
of (7„ determines the magnitude of the measurement error. The 
failure times were then defmed as 7",= exp{y,} where 
Yj = piXi + P2Zi + Ei. The e,'s were taken to be independent and 
identically distributed from either a standard normal, standard 
extreme value, or logistic distribution, respectively. 

Various other parameters are controlled over all simulations. 
Each run calculates 1000 replicates in the bootstrapping to give 
consistent estimators of the standard deviations. The parameters 
were chosen as /?' = (;8i ,/?2) = ( log (2), log(1.5)). Within a simu- 
lation, the censoring times are randomly generated from a uniform 
distribution with lower limit 0 and an appropriate upper limit to 
ensure an approximate 30% or 50% censor rate. The n and riy 
values are chosen to be either n = 400 and ny = 200, having half of 
the data in the validation set, or « = 250 and ny = 150, with the 
validation set containing 60% of the data. Finally, two values of (J,, 
are selected, f7„ =0.5, and cr„ = 0.8. For the kernel smoothing used 
to calculate jSj the Gaussian kernel function is selected, which has 
an order of 2, 



K(u)-- 



1 

2nh 



where u = (Wi — Wi)/h. We choose bandwidth h = 2aun as 
used by [12] (Zhou and Wang, 2000). 



The standard error (SE), standard deviation (SD), and coverage 
probability (CP) are calculated for each set of simulations. The SE 
values are the sample standard deviations of the f! estimates, the 
SD values are the mean standard deviations generated from the 
bootstrapping in each simulation, and CP is equal to the 
percentage of simulations that had the true ;8 value within a 
95% confidence interval around its estimate when using the result 
of the bootstrapping for the standard deviation. The results are 
presented in Tables 1, 2, and 3. 

From Tables 1, 2, and 3, we make the following observations: 

Estimators f^s and py are performing very well for each of 
the three error distributions. 

Naive estimator Pj^ is biased when the measurement error 
variance is large. 

The Ps estimator is more efficient than Py, having 
standard errors comparable to that of P^v- 
The proposed method removes the estimation bias in 
both for the regression coefficient of the error-prone 
covariate and that of the accurately measured covariates. 
The bootstrapping procedure results in good estimates of 
the standard error for all observed cases over the four 
estimators and three error distributions. 
The coverage probabilities for the 95% confidence 
intervals are very close to their nominal level, except for 
j6jv when f7„ is large, where the estimate is severely biased. 
) The model experiences the least variation when the error 
term follows the standard normal distribution, with 



Table 5. AFT model analysis of PBC data, smoothing for \og{copper). 



Covariate 




SD 


P-Value 




SD 


P-Value 


Intercept 


14.6413 


2.1482 


9.3809e-12 


15.1929 


1.8216 


O.OOOOe+00 


log(copper} 


-0.3299 


0.0883 


1 .8663e-04 


-0.3105 


0.0873 


3.7675e-04 


age 


-0.0250 


0.0061 


3.9593e-05 


-0.0217 


0.0061 


3.4084e-04 


log(albumin) 


1.4324 


0.5499 


9.1876e-03 


1.2576 


0.5783 


2.9666e-02 


log(bili) 


-0.4218 


0.0717 


3.9422e-09 


-0.4018 


0.0739 


5.3323e-08 


edemaOS 


-0.4285 


0.2160 


4.7314e-02 


-0.3097 


0.2226 


1.6422e-01 


edemal 


-0.9021 


0.3041 


3.0152e-03 


-0.9411 


0.3113 


2.5003e-03 


log(protime) 


-2.2738 


0.8185 


5.4687e-03 


-2.5324 


0.7294 


5.1651e-04 


doi:l 0.1 371 /journal.pone.Ol 0481 7.t005 
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standard errors that are approximately half the size as 
when the error term follows the chosen extreme value or 
logistic distributions, 
(viii). The efficiency gain is ignorable when a\ is small, such as 
0.2 or smaller (simulation results not reported). 

Application to PBC Data 

To illustrate how to use the smoothing method in practice, we 
analyze the data from the Mayo Clinic trial in primary biliary 
cirrhosis (PBC) of the liver. PBC is a chronic liver disease that 
inflames and slowly destroys the bile ducts in the Uver, impairing 
its ability to function properly. It is believed to be a type of 
autoimmune disorder where the immune system attacks the bile 
ducts. PBC occurs primarily in women, with approximately 90% 
of patients being women, most often between the ages of 40 and 
60. There is currently no known cure for the disease; the only 
known way to remove PBC is through a liver transplant, see 
[27,28]. 

In the Mayo Clinic trial, 418 patients were eligible. Of these 
patients, mostly complete data was obtained from the first 312 
patients. The remaining 106 patients were not part of the actual 
clinical trial but had some basic measurements taken and were 
followed for survival. The variables we used in our regression on 
the logarithm of time were age, patient's age (in years); albumin, 
serum albumin (in mg/dl); asi, aspartate aminotransferase (in U/ 
ml), once referred to as SOOT; hili, serum bilirunbin (in mg/ cU); 
copper, urine copper (ug/ day); edema, equal to 0 if no edema, 0.5 if 
untreated or successfully treated, or 1 if there exists edema despite 
diuretic therapy; protime, standardized blood clotting time. Of 
these, two cases were examined using either ast or copper for our 
X covariate to be smoothed due to incomplete data, while the 
others are mostly complete and thus arc included in Z. 

Edema was split into two categorical variables, edema05 and 
edemal, defined as 

f 1 , edema = 0.5, 
edemaQS = < 

(. 0 , otherwise, 

and 

{1 , edema = 1 , 
0 , otherwise. 

We also took the log transformation oi albumin, ast, hili, copper, 
and protime, in the interest of making their marginal distributions 
closer to normal. For the smoothing of the unobserved log (ast) 
and log (copper) values, log (hili) was chosen as the auxiliary 
covariate for both due to its high correlation (>0.5) with both 
variables. The bandwidth h was calculated using the sample 
standard deviation of log (bili), resulting in (7„ « 1.020551 and 
/! = 0.2734221 using the same formula as in the numerical 
simulations, A = 2(7„«^'''^. Any observations missing a value for 
either of the Z covariates were removed, leaving both cases with 
« = 4 1 6 while M = 3 1 2 for the model using log (ast) and ny = 310 
for the model using log (copper). 
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Discussion 

In this paper we proposed the use of the Buckley-James 
estimator as a nonparametric method of estimating the regression 
parameters of an accelerated failure time model with auxiliary 
covariates. Kernel smoothing was applied using the auxiliary 
covariates to estimate missing or mismeasured covariates. The 
Buckley-James method is then applied to the whole study cohort 
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The method should also perform well for the missing variable case 
given a sufficiently strong correlation. The method was applied to 
the PBC data as an illustration. 

The smoothing model is set up in a general format. In 
apphcations, we should only choose those variables which are 
highly related to the mismeasured one. By doing so we can avoid 
the situations such as the auxiliary covariates only occupy a 
narrow region, which could cause instability in the local 
smoothing, hence the whole model. 

Caution should also be taken when the proposed method is 
applied to a data with extremely small validation sample. A classic 
measurement error model might be a better option, where one can 
estimate the measurement error variance using the validation 
sample. 
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